Corpus Creation and Initial SMT Experiments between Spanish and Shipibo-konibo

Authors

  • Ana Paula Galarreta
  • H. Andrés Melgar S.
  • Arturo Oncevay-Marcos
Abstract

In this paper, we present the first attempts to develop a machine translation (MT) system between Spanish and Shipibo-konibo (es-shp). There are very few digital texts written in Shipibo-konibo and even fewer bilingual texts that can be aligned; hence, we had to create a parallel corpus using both bilingual and monolingual texts. We describe how this corpus was built, as well as the process we followed to improve the quality of the sentences used to train a statistical MT (SMT) model. The results surpassed the proposed dictionary-based baseline and are promising for further development, considering the size of the corpus used. Finally, we expect that this MT system can be reinforced with additional linguistic rules and automatic language processing functions that are currently being implemented.

Similar resources

High Prevalence of Human T-Lymphotropic Virus Infection in Indigenous Women from the Peruvian Amazon

BACKGROUND In an earlier study, we detected an association between human T-cell lymphotropic virus (HTLV) infection and cervical human papillomavirus (HPV) in indigenous Amazonian Peruvian women of the Shipibo-Konibo ethnic group. As both HTLV and HPV can be transmitted sexually, we now report a population-based study examining the prevalence and risk factors for HTLV-1 and HTLV-2 infection in ...

Association between Human Papillomavirus and Human T-Lymphotropic Virus in Indigenous Women from the Peruvian Amazon

BACKGROUND No association between the Human T-cell lymphotropic virus (HTLV), an oncogenic virus that alters host immunity, and the Human Papillomavirus (HPV) has previously been reported. Examining the association between these two viruses may permit the identification of a population at increased risk for developing cervical cancer. METHODS AND FINDINGS Between July 2010 and February 2011, ...

Spell-Checking based on Syllabification and Character-level Graphs for a Peruvian Agglutinative Language

There are several native languages in Peru which are mostly agglutinative. These languages are transmitted from generation to generation mainly in oral form, causing different forms of writing across different communities. For this reason, there are recent efforts to standardize the spelling in the written texts, and it would be beneficial to support these tasks with an automatic tool such as a...

Cultural Influence on the Expression of Cathartic Conceptualization in English and Spanish: A Corpus-Based Analysis

This paper investigates the conceptualization of emotional release from a cognitive linguistics perspective (Cognitive Metaphor Theory). The metaphor weeping is a means of liberating contained emotions is grounded in universal embodied cognition and is reflected in linguistic expressions in English and Spanish. Lexicalization patterns which encapsulate this conceptualization i...

Evaluating Indirect Strategies for Chinese - Spanish Statistical Machine Translation: Extended Abstract

Although Chinese and Spanish are two of the most spoken languages in the world, not much research has been done in machine translation for this language pair. This paper focuses on investigating the state of the art of Chinese-to-Spanish statistical machine translation (SMT), which nowadays is one of the most popular approaches to machine translation. For this purpose, we report details of the...

Publication date: 2017